Abstract

We propose a Markov chain simulation method to generate simple connected random graphs with a specified degree sequence and level of clustering. The networks generated by our algorithm are random in all other respects and can thus serve as generic models for studying the impacts of degree distributions and clustering on dynamical processes, as well as null models for detecting other structural properties in empirical networks.



1. Introduction 

Complex networks such as those formed by the links of the World Wide Web, social contacts between individuals in a city, and transportation routes have received much attention in the last decade. Recent studies have sought to characterize and explain non-trivial structural properties such as heavy-tailed degree distributions, clustering, short average path lengths, degree correlations and community structure. These properties appear in diverse natural and man-made systems, and can fundamentally influence dynamical processes on these networks.

Clustering, a property describing the presence of triangles in a network, is an important topological characteristic that can significantly impact dynamical processes over complex networks [?, ?, 34, ?, ?, ?]. It is often correlated with local graph properties such as correlations in the number of edges emanating from neighboring vertices [34], as well as global properties such as motifs [?, 39] and community structure [?].

Random graphs are graphs that are generated by some random process [26]. They are widely used as models of complex networks [29] and can assume various levels of complexity. The simplest model for generating random graphs, with only a single parameter, is the Bernoulli or Erdős-Rényi random graph model, which produces graphs that are completely defined by their average degree and are random in all other respects. A slightly more complex and general model is one that generates graphs with a specified degree distribution (or degree sequence) that are random in all other respects. These models can be extended to include additional structural constraints, such as degree correlations or the density of triangles or longer cycles. Here, we define a random graph model which is constrained by the node degree distribution and the density of triangles in the graph.

1.1. Clustering in Real Networks 

Clustering in real networks can stem from two sources: (a) it can arise as a byproduct of other, more fundamental, topological properties such as the degree sequence (distribution) or degree correlations; or (b) it can be generated directly by some inherent property or mechanism within the system, for example, "the friends of my friends tend to become my friends" in social networks.

Some researchers have claimed that high clustering is a general feature of complex networks [34]. When we measured clustering in a variety of empirical technological, biological and social networks, however, we found that it varies considerably. Table 1 shows that the clustering coefficients and transitivity values (a local and a global measure of clustering, respectively) for these networks span most of the possible range from zero to one. Thus, it is important to understand not only the origins of clustering, but also its impact on network functions and dynamics. Towards this end, we introduce a method for generating random networks with a specified level of clustering.
Empirical Network             N      ⟨d⟩    ⟨d²⟩    C      T      C̃      T̃
Vancouver Urban Contacts      2627   13.9   265     0.07   0.09   0.09   0.14
WWW Subgraph                  4271   4.2    119     0.09   0.01   0.15   0.08
Yeast Protein Interactions    4713   6.3    152     0.13   0.06   0.14   0.18
Astro-Phys Collaborations     5973   4.1    35      0.51   0.38   0.60   0.62
US Air Traffic                165    38.0   2765    0.86   0.58   0.97   0.96

Table 1: The number of nodes (N), the average node degree (⟨d⟩), the mean squared node degree (⟨d²⟩), clustering coefficient (C), transitivity (T), Soffer-Vázquez clustering coefficient (C̃), and Soffer-Vázquez transitivity (T̃) for a set of empirical networks.



1.2. Previous Work & Motivation 

The study of clustering in complex networks began with the seminal work of Watts and Strogatz [?]. The authors presented a graph model with high clustering and low average path length, a combination now known as the small-world property. Although not intended as a generative algorithm for clustered graphs, the model produces graphs with clustering spanning the range from 0 to 1. The graphs generated under this model, however, have a rigid spatial structure and cannot accommodate varying degree distributions.

The first algorithms to explicitly generate graphs with a specified level of clustering for arbitrary degree distributions belonged to the class of projected bipartite graphs. Newman [25] introduced a three-step method that first builds a bipartite graph of individuals and affiliations, then projects the bipartite graph to a unipartite graph of individuals only, and finally runs a percolation process over the unipartite graph. This results in a clustered graph with a degree distribution that depends on the original distributions of numbers of individuals per group and groups per individual. The level of clustering in the final graph varies smoothly from 0 to 1 as a function of the percolation probability. In [11], Guillaume suggested a similar bipartite graph approach. Although these approaches can generate clustered graphs with diverse degree distributions, they lack straightforward methods for choosing parameters that yield graphs with both a pre-specified clustering coefficient and a pre-specified degree distribution. These algorithms also tend to produce disconnected graphs that leave a significant proportion of the graph vertices isolated.

A second class of clustered graph models uses "growing network" algorithms [33, 40, 37]. The inputs to these models are a degree distribution and a level of clustering. The method begins with a set of vertices with no edges; the graph is then "grown" by adding edges based on the degree and clustering constraints. Although the algorithms of this class allow for arbitrary degree distributions and levels of clustering, they either require a complex implementation [33], produce graphs of a highly specific structure [37], or introduce large amounts of degree correlations [37, 40].

Here, we present a model that generates simple and connected graphs with prescribed degree sequences and a specified frequency of triangles, while maintaining a graph structure that is as random (uncorrelated) as possible. There is an important difference between our model and previous work in the area. Prior models were intended to generate clustered graphs that replicate the properties of real-world networks; our goal is to generate a class of null networks with arbitrary degree distributions that are simple and connected and have a prescribed density of triangles, but are random in all other respects.

Such a method is useful for two primary reasons. First, network structure fundamentally influences the functions of and dynamical processes on networks. We can use random clustered graphs to study the consequences of clustering, both independently and in combination with various degree patterns. Second, these networks can serve as null models for detecting whether an empirical network can be boiled down to its degree distribution and clustering values or, instead, contains substantial degree correlations or other important structures (beyond the byproducts of the degree distribution and clustering). One would first use the algorithm to generate an ensemble of networks that match the empirical degree distribution and clustering values, and then compare the structural, functional, or dynamical properties of the empirical network to those of the random networks.

In Section 2, we review common measures of clustering and introduce our Markov chain model and algorithm for generating clustered graphs with a specified degree sequence. In Section 3, we test our algorithm with numerical simulations and discuss the structural properties of the generated graphs. Finally, in Section 4, we use the generated graphs to detect deviations from randomness in empirical networks.

2. Methods and Model 

Our random graph generation method begins with a random graph and iteratively rewires edges to introduce triangles. Network rewiring is a well-known method for generating networks with desired properties [19]. Two edges are called adjacent if they connect to a common node. Each rewiring is performed on two non-adjacent edges of the graph and consists of removing these two edges and replacing them with another pair of edges. Specifically, a pair of edges (i, j) and (k, l) is replaced with either (i, k) and (j, l), or (i, l) and (j, k) (as illustrated in Figure 1c), provided the new edges do not already exist, so that the graph remains simple. Each of the four participating nodes loses one edge and gains one, so their degrees are unchanged and the specified degree sequence is maintained. Below we describe a rewiring algorithm that increases the level of clustering in a random graph while preserving the degree sequence.
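A single degree-preserving rewiring step can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes the graph is stored as a dict mapping each node to a set of neighbours, the function name `double_edge_swap` is hypothetical, and the choice between the two possible re-pairings is made with probability 1/2. It rejects pairs of adjacent edges and swaps that would create a multiple edge; connectivity is deliberately not checked here.

```python
import random

def double_edge_swap(adj, rng=random):
    """Try one degree-preserving rewiring of two non-adjacent edges.

    adj maps each node to the set of its neighbours (undirected, simple graph).
    Returns True if the swap was applied, False if the sampled pair was rejected.
    """
    # List each undirected edge once (assumes orderable node labels).
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    (i, j), (k, l) = rng.sample(edges, 2)
    if len({i, j, k, l}) < 4:
        return False  # the two edges are adjacent (share a node)
    # Pick one of the two possible re-pairings.
    new_edges = [(i, k), (j, l)] if rng.random() < 0.5 else [(i, l), (j, k)]
    if any(b in adj[a] for a, b in new_edges):
        return False  # would create a multiple edge
    for a, b in ((i, j), (k, l)):
        adj[a].discard(b)
        adj[b].discard(a)
    for a, b in new_edges:
        adj[a].add(b)
        adj[b].add(a)
    return True
```

Every accepted call leaves each node's degree unchanged, which is the property the clustering algorithm relies on.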

2.1. Measures of Clustering 

We begin with a graph G = (V, E) which is undirected and simple (no self-loops or multiple edges). V is the set of vertices of G and E is the set of edges. We let N = |V| and M = |E| denote the number of nodes and edges in G, respectively. The degree of a node i will be denoted d_i. The set of degrees for all nodes in the graph makes up the degree sequence, and the corresponding probability distribution is called the degree distribution.

Clustering is the likelihood that two neighbors of a given node are themselves connected. In terms of social networks, it measures the probability that "the friend of my friend is also my friend." In topological terms, clustering measures the density of triangles in the graph, where a triangle is a triplet of nodes i, j, k joined by all three edges (i, j), (i, k) and (j, k) (Figure 1b).

Figure 1: (a) A triple among the nodes i, j, k. (b) A triangle among the nodes i, j, k. (c) A rewiring of edges (i, j) and (k, l) can result in (i, k) and (j, l), or (i, l) and (j, k). (d) Four (among many) scenarios for the result of one rewiring step of our algorithm. The configuration of edges before (left) and after (right) a rewiring step is shown for each scenario. The two bottom scenarios would be rejected by our algorithm as they do not strictly increase the number of triangles.

To quantify the local presence of triangles, we define δ(i) as the number of triangles in which node i participates. Since each triangle consists of three nodes, it is counted three times when we sum δ(i) over all nodes in the graph. Thus the total number of triangles in the graph is

$$\delta(G) = \frac{1}{3} \sum_{i \in V} \delta(i).$$

A triple is a set of three nodes i, j, k that are connected by edges (i, j) and (j, k), regardless of the existence of the edge (i, k) (Figure 1a). The number of triples centered on node i is simply

$$\tau(i) = \binom{d_i}{2},$$

assuming d_i ≥ 2. To compute the total number of triples in the graph we sum τ(i):

$$\tau(G) = \sum_{i \in V} \tau(i).$$

The term triadic closure refers to the conversion of a triple into a triangle via the addition of a third edge [INSERT REFS].

The clustering coefficient was introduced by Watts and Strogatz [?] as a local measure of triadic closure. For a node i with d_i ≥ 2, the clustering coefficient c(i) is the fraction of triples centered on node i that are closed, c(i) = δ(i)/τ(i). The clustering coefficient of the graph is then given by

$$C(G) = \frac{1}{N_2} \sum_{\{i \in V \mid d_i \geq 2\}} c(i),$$

where N_2 is the number of nodes with d_i ≥ 2.

A more global measure of the presence of triangles is called the transitivity of graph G and is defined as

$$T(G) = \frac{3\,\delta(G)}{\tau(G)}.$$

Although they are often similar, T(G) and C(G) can vary by orders of magnitude [35]. They differ most when the triangles are heterogeneously distributed in the graph.

These traditional measures of clustering are degree-dependent and thus can be biased by the degree sequence of the network. The maximum number of possible triangles for a given node i is just its number of triples, τ(i). For a node that is connected only to low-degree neighbors, however, the maximum number of possible triangles may be much smaller than τ(i). To account for this, a new measure of clustering was introduced in [35] that calculates triadic closure as a function of degree and neighbor degree. Specifically, the Soffer-Vázquez clustering coefficient (C̃) and transitivity (T̃) are given by

$$\tilde{C} = \frac{1}{N_\omega} \sum_{\{i \mid \omega(i) > 0\}} \frac{\delta(i)}{\omega(i)}, \qquad \tilde{T} = \frac{3\,\delta(G)}{\omega(G)},$$

where ω(i) measures the number of possible triangles for node i, N_ω is the number of nodes in G for which ω(i) > 0, and ω(G) = Σ_i ω(i). We note that C̃ and T̃ are undefined if ω(G) = 0. The quantity ω(i) is computed by counting the maximum number of edges that can be drawn among the d_i neighbors of node i, given the degree sequence of i's neighbors; this value is often smaller than τ(i) = d_i(d_i − 1)/2 [35]. For example, consider a star network of five nodes, where four nodes have degree 1 and one node has degree 4. Although the total number of triples is τ(G) = 6, the number of possible triangles is ω(G) = 0 because the degree-one nodes preclude their formation.
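The measures above can be computed directly from an adjacency structure. The sketch below assumes the graph is a dict of neighbour sets, and all helper names are hypothetical. Because the exact procedure for ω(i) is only described in words here, `omega_upper_bound` substitutes a simple bound (each neighbour j can use at most min(d_j − 1, d_i − 1) edge ends inside the neighbourhood of i), which is never smaller than ω(i); C̃ and T̃ computed with it are therefore approximations, not the Soffer-Vázquez values themselves.

```python
from itertools import combinations

def triangles_per_node(adj):
    """delta(i): number of triangles that contain node i."""
    return {i: sum(1 for j, k in combinations(adj[i], 2) if k in adj[j])
            for i in adj}

def triples_per_node(adj):
    """tau(i) = d_i (d_i - 1) / 2: number of triples centred on node i."""
    return {i: len(adj[i]) * (len(adj[i]) - 1) // 2 for i in adj}

def omega_upper_bound(adj, i):
    """Hypothetical stand-in for omega(i): an upper bound on the number of
    edges that can be drawn among the neighbours of i, given their degrees."""
    k = len(adj[i])
    # Neighbour j has at most d_j - 1 free edge ends (one is used by i)
    # and can link to at most k - 1 other neighbours.
    ends = sum(min(len(adj[j]) - 1, k - 1) for j in adj[i])
    return min(k * (k - 1) // 2, ends // 2)

def clustering_measures(adj, omega=omega_upper_bound):
    """Return (C, T, C_sv, T_sv); nan marks an undefined measure."""
    nan = float("nan")
    delta, tau = triangles_per_node(adj), triples_per_node(adj)
    n_triangles = sum(delta.values()) // 3              # delta(G)
    deg2 = [i for i in adj if len(adj[i]) >= 2]
    C = sum(delta[i] / tau[i] for i in deg2) / len(deg2) if deg2 else nan
    tau_G = sum(tau.values())
    T = 3 * n_triangles / tau_G if tau_G else nan
    w = {i: omega(adj, i) for i in adj}
    pos = [i for i in adj if w[i] > 0]
    C_sv = sum(delta[i] / w[i] for i in pos) / len(pos) if pos else nan
    w_G = sum(w.values())
    T_sv = 3 * n_triangles / w_G if w_G else nan
    return C, T, C_sv, T_sv
```

For the five-node star of the example, this gives τ(G) = 6 and reports C̃ and T̃ as undefined (nan), since the bound is 0 for every node.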

2.2. Generating Random Graphs 

Generating random graphs uniformly from the set of simple connected graphs with a prescribed degree sequence is a well-studied problem with algorithmic solutions [?]. One of the simplest and most popular of these generative algorithms was originally suggested by Molloy and Reed [20]. Their model, however, sometimes produces graphs that are not simple or connected. This can be remedied by subsequently removing multiple edges and self-loops from the constructed graph and keeping only the largest connected component, at the cost of altering the degree sequence. This approach works, but the Markov chain Monte Carlo (MCMC) method for generating simple connected graphs with specified realizable degree sequences [?] presented in [9, 19] is less prone to problems. It proceeds as follows:

1. Create a graph with the desired degree sequence 
using the deterministic Havel-Hakimi algorithm. 

2. Connect any disconnected components of the graph using the Taylor algorithm [36].

3. Randomly rewire the graph while keeping it simple and connected [19].

Figure 2: Possible triangle additions (green) and removals (red) in one step of the rewiring procedure. Black lines represent existing edges and edges added after a rewiring event; gray lines represent edges lost during a rewiring event.

The Havel-Hakimi algorithm is iterative and tracks the residual degree of each node, which is the difference between its desired degree and its current degree. In each iteration, it picks an arbitrary node x and adds edges from x to as many other nodes as the residual degree of x, choosing the nodes with the highest residual degrees. The residual degrees of all the nodes are then updated. The Taylor algorithm merges disconnected components of a graph by randomly selecting edges (i, j) and (k, l) from different components of the graph and rewiring them to (i, k) and (j, l), as long as the rewiring does not create new disconnected components.
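The Havel-Hakimi construction described above can be sketched compactly in Python. As an assumption, the sketch always processes the node with the largest residual degree (one admissible choice of x), and it returns `None` when the sequence is not graphical. It produces a simple graph but does not guarantee connectivity, which is the job of the Taylor step; the function name is illustrative.

```python
def havel_hakimi(degrees):
    """Return a simple graph (dict node -> set of neighbours) realizing the
    degree sequence `degrees`, or None if the sequence is not graphical."""
    residual = dict(enumerate(degrees))  # desired minus current degree
    adj = {v: set() for v in residual}
    while True:
        active = [v for v in residual if residual[v] > 0]
        if not active:
            return adj
        # Assumption: process the node with the largest residual degree.
        x = max(active, key=lambda v: residual[v])
        need, residual[x] = residual[x], 0
        # Connect x to the `need` other nodes with the highest residual degrees.
        partners = sorted((v for v in active if v != x),
                          key=lambda v: residual[v], reverse=True)[:need]
        if len(partners) < need:
            return None  # not enough partners: sequence is not graphical
        for v in partners:
            adj[x].add(v)
            adj[v].add(x)
            residual[v] -= 1
```

For example, `havel_hakimi([3, 3, 3, 1])` returns `None`, since no simple graph has that degree sequence.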

2.3. Markov Chain Model 

Our method of generating clustered graphs can be described by a Markov chain. We let D be a realizable degree sequence and define G_D to be the set of all simple, connected graphs with degree sequence D. If G_1, G_2, ..., G_{|G_D|} are the graphs of G_D, then we let X_1, X_2, ..., X_{|G_D|} be the states of the Markov chain P, where X_i represents the state in which our graph G = G_i. The states X_i and X_j are connected in the Markov chain if G_i can be changed to G_j with the rewiring of one pair of edges. The state space of the Markov chain P is connected because there exists a path from X_i to X_j (for any pair i, j) by one or more rewiring moves that leave the degree sequence unchanged [36].



Our clustered graph generation algorithm involves first obtaining a graph G in G_D by the method outlined in Section 2.2, and then transitioning from the state corresponding to G (X_0) to other states of P until a halting condition is reached. A transition from one state of the Markov chain to another only happens when the algorithm makes an edge rewiring that both increases the number of triangles in the graph and leaves the graph connected. Since a rewiring does not alter the degree sequence of the graph, the rewired graph is still in G_D. The transition probabilities of the Markov chain for a pair of connected states, X_i to X_j, are:

$$P_{ij} = \begin{cases} 1 & \text{if } (\Delta_j - \Delta_i) > 0 \text{ and } G_j \text{ is connected}, \\ 0 & \text{otherwise}, \end{cases}$$

where Δ_i is the number of triangles in G_i. The algorithm continues searching for a feasible rewiring (one that increases the number of triangles and does not disconnect the graph) until one is found. If a feasible move is not found, a transition is not made and the process remains in the current state.

The Markov chain above is finite and aperiodic, but not irreducible, as the process can never transition to a state in which the graph has fewer triangles. It does, however, have an absorbing state, X*, in which the transitivity of G* is greater than or equal to the desired transitivity or is the maximum possible transitivity given the particular degree sequence and connectivity constraints.

2.4. Algorithm 

Our Markov chain simulation algorithm for generating clustered random graphs is described below and illustrated in Figure 1d.

Input: A random graph, G, with a realizable degree sequence {d_i}, generated using the method outlined in Section 2.2 or another suitable method; a desired clustering value, target; and a tolerance value, TOL.

Initialization: Measure the clustering of G, clust(G).

while |clust(G) − target| > TOL do
  1. Uniformly select a random node, x, from the set of all nodes of G such that d_x > 1.
  2. Uniformly select two random neighbors, y_1 and y_2, of x such that d_{y_1} > 1, d_{y_2} > 1 and y_1 ≠ y_2.
  3. Uniformly select a random neighbor, z_1, of y_1 and a random neighbor, z_2, of y_2 such that z_1 ≠ x, z_2 ≠ x and z_1 ≠ z_2.
  4. G_cand := G, where G_cand is the candidate graph to which the transition may be made.
  5. if (y_1, y_2) and (z_1, z_2) do not exist then
       Rewire two edges of G_cand: delete (y_1, z_1) and (y_2, z_2), add (y_1, y_2) and (z_1, z_2).
     end
  6. Update the value of clust(G_cand) by measuring δ(i) (and ω(i) if relevant) for the nodes involved in the rewiring and their neighbors.
  7. if clust(G_cand) > clust(G) and G_cand is connected then
       G := G_cand
     end
end

Output: A random graph, G, with degree sequence {d_i} and clust(G) = target ± TOL.

The algorithm terminates when the desired clustering (within a given tolerance) or the maximum clustering possible is reached. In the latter case, the desired clustering is not achieved given the degree and connectivity constraints. Theoretically, the algorithm may never reach the target, but if it does, the answer is guaranteed to be correct (this is sometimes known as a Las Vegas type algorithm). For practical implementation purposes, a threshold can be placed on the number of iterations run by the algorithm in the case that the desired clustering cannot be reached.

2.4.1 Choice of Clustering Measure

The algorithm is defined independently of the choice of clustering measure. The term clust(G) in the algorithm above can be replaced by any clustering measure described in Section 2.1, or, more simply, by the number of triangles in the graph.

The choice of clustering measure does, however, affect the output of the algorithm. The clustering coefficient is a local measure, and thus C and C̃ yield networks that are only locally optimized for the desired level of clustering. Also, as connectivity is required by our algorithm, the algorithm does not generate graphs that must be disconnected into multiple components to attain high levels of clustering (an example of this is given in Appendix Figure 8). The algorithm may also have difficulty attaining target clustering values when using the standard clustering measures (C or T) because of joint degree constraints (the degrees of adjacent nodes) on the possible numbers of triangles, as with the example presented in Section 2.1. The Soffer-Vázquez clustering measures, which explicitly consider joint degree constraints, provide a way around this difficulty [35]. Although the rewiring in our algorithm changes the joint degree distribution (and thus the degree correlations) of the graph, ω(G) is not altered significantly during network generation (as shown in Appendix Figure 9). Thus, when using C̃ or T̃, clustering is increased primarily by the addition of triangles (that is, by increasing δ(G)) rather than by decreasing ω(G). By contrast, τ(G) depends only on the degree sequence and is left exactly unchanged by rewiring, so T changes only through δ(G).

Generated Network Type        N      ⟨d⟩    ⟨d²⟩    T              T̃             Diam         r               Q
Vancouver Urban Contacts      2627   13.9   265     0.09 [0]       0.14 [0]      6 [0]        0.15 [-0.4]     0.28 [-0.15]
WWW Subgraph                  4271   4.2    119     0.03 [0.02]    0.1 [0.02]    15 [5]       0.07 [0.37]     0.45 [-0.15]
Yeast Protein Interactions    4713   6.3    152     0.07 [0.01]    0.18 [0]      12.5 [3.5]   0.11 [0.07]     0.39 [-0.1]
Astro-Phys Collaborations     5973   4.1    35      0.26 [-0.05]   0.62 [0]      17 [-3]      0.25 [-0.07]    0.70 [-0.1]
US Air Traffic                165    38.0   2765    0.58 [0]       0.97 [0]      3 [0]        -0.55 [0]       0.11 [-0.01]

Table 2: Comparisons between empirical networks and random graphs. For each empirical network, we generated 25 random graphs constrained to have the observed degree sequences and Soffer-Vázquez transitivity values. The table reports average values of several network statistics for the random graphs: network size (N), mean degree (⟨d⟩), mean squared degree (⟨d²⟩), Soffer-Vázquez clustering coefficient (C̃), Soffer-Vázquez transitivity (T̃), maximum shortest path length between any two nodes (Diam), degree correlation coefficient (r), and modularity (Q). The value given in brackets is the deviation of the ensemble mean from the corresponding statistic for the empirical network. (A positive deviation indicates that the ensemble mean was greater than the empirical statistic and vice versa.) Deviations are not listed for N, ⟨d⟩ and ⟨d²⟩ as network size and degree sequence are constrained by our algorithm to match the empirical networks perfectly.

2.5. Analysis

As shown in Figure 2, there are six types of triangles that can be added or removed for every pair of edges that are rewired. As illustrated in Figure 1d, these additions and removals can occur in combination.

Type A: The addition of the edge between vertices y_1 and y_2 guarantees the addition of one triangle, (x, y_1, y_2), in every rewiring event.

Type B: The addition of the edge (y_1, y_2) could create new triangles with other shared neighbors of y_1 and y_2.

Type C: The addition of the edge (z_1, z_2) could add a triangle if there existed edges between x and z_1 and between x and z_2.

Type D: The addition of the edge between vertices z_1 and z_2 could create new triangles with other shared neighbors of z_1 and z_2.

Type E: The removal of edges (y_1, z_1) and (y_2, z_2) removes one triangle each if the edges (x, z_1) or (x, z_2), respectively, exist.

Type F: The removal of the edges between vertices y_1 and z_1, and between y_2 and z_2, could lead to the removal of existing triangles with shared neighbors of y_1 and z_1 or of y_2 and z_2.

We note that although the type A addition is a special case of type B, the type C addition is a special case of type D, and the type E removals are a special case of type F, we distinguish them because they have different probabilities of occurrence. Our look-ahead strategy only allows rewiring moves when the total number of Type E and F losses is fewer than the total number of Type A, B, C, and D gains.
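The loop of Section 2.4, together with the look-ahead of Section 2.5, can be sketched in Python as below. This is a hedged illustration under several assumptions: clust(G) is taken to be the ordinary transitivity T; invalid samples in steps 2 and 3 are rejected and redrawn on the next iteration rather than conditioned on; connectivity is checked by a full breadth-first search; and a `max_steps` cap plays the role of the iteration threshold mentioned above. Since a degree-preserving rewiring leaves every τ(i) = d_i(d_i − 1)/2 unchanged, τ(G) is constant, so T changes only through the triangle count, which `triangle_delta` updates by counting the shared neighbours of each removed and added edge (the Type A to F gains and losses). All function names are hypothetical.

```python
import random
from collections import deque

def is_connected(adj):
    """Breadth-first search from an arbitrary node: O(N + M)."""
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)

def apply_rewiring(adj, removed, added):
    for u, v in removed:
        adj[u].discard(v)
        adj[v].discard(u)
    for u, v in added:
        adj[u].add(v)
        adj[v].add(u)

def triangle_delta(adj, removed, added):
    """Exact net change in the triangle count (gains minus losses), computed
    on copies of the endpoints' neighbour sets only (the look-ahead)."""
    local = {v: set(adj[v]) for edge in removed + added for v in edge}
    change = 0
    for u, v in removed:   # a removed edge destroys every triangle on it
        local[u].discard(v)
        local[v].discard(u)
        change -= len(local[u] & local[v])
    for u, v in added:     # an added edge closes one triangle per shared neighbour
        change += len(local[u] & local[v])
        local[u].add(v)
        local[v].add(u)
    return change

def rewire_step(adj, rng):
    """One proposal (steps 1-7); returns the triangle gain, or 0 if rejected."""
    x = rng.choice([v for v in adj if len(adj[v]) > 1])           # step 1
    ys = [y for y in adj[x] if len(adj[y]) > 1]
    if len(ys) < 2:
        return 0
    y1, y2 = rng.sample(ys, 2)                                     # step 2
    z1 = rng.choice([z for z in adj[y1] if z != x])                # step 3
    z2 = rng.choice([z for z in adj[y2] if z != x])
    if z1 == z2 or y2 in adj[y1] or z2 in adj[z1]:                 # step 5 test
        return 0
    removed, added = [(y1, z1), (y2, z2)], [(y1, y2), (z1, z2)]
    gain = triangle_delta(adj, removed, added)                     # steps 4-6
    if gain <= 0:
        return 0
    apply_rewiring(adj, removed, added)                            # step 7
    if not is_connected(adj):
        apply_rewiring(adj, added, removed)                        # undo the move
        return 0
    return gain

def cluster_graph(adj, target, tol=0.01, max_steps=100_000, seed=None):
    """Raise transitivity T towards target; modifies adj in place, returns T."""
    rng = random.Random(seed)
    tau_G = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())  # fixed by degrees
    triangles = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u]) // 6
    for _ in range(max_steps):
        if abs(3 * triangles / tau_G - target) <= tol:
            break
        triangles += rewire_step(adj, rng)
    return 3 * triangles / tau_G
```

The look-ahead avoids building a full candidate graph G_cand: the graph is modified only for moves that strictly increase the triangle count, and a move is undone if it disconnects the graph.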



2.6. Computational Complexity 



Like many MCMC methods, the algorithm we propose can be computationally expensive. The method outlined in Section 2.2 requires O(M) steps to generate a connected graph, and up to O(M) steps to randomize the graph, where M is the number of edges in the graph. At each step of randomization, we test that the graph remains connected (an O(M) operation), resulting in an overall O(M²) network generation process. A naive computation of the transitivity or clustering coefficient requires checking every node for the existence of edges between every pair of neighbors of the node. This step requires O(N d_max²) operations, where N is the number of nodes and d_max is the maximum degree of any node in the graph.

The most expensive step of our algorithm is the introduction of triangles via rewiring. A single rewiring step requires O(M) operations for switching edges, checking for connectivity and updating the triangle count. Although we cannot calculate analytically the number of rewiring steps required to reach the desired transitivity, we have found it empirically to be O(M). Thus, the average complexity of the algorithm presented here is O(M²). This complexity has been computed for the most naive versions of our algorithms, and more efficient implementations may improve the complexity greatly. For example, we might improve efficiency by performing connectivity tests once every x rewirings (for some number x) rather than during every rewiring, as proposed in [9].
3. Results

3.1. Numerical Simulations

To check the feasibility and reliability of the algorithm, we generated networks for several degree distributions and a range of clustering values. Specifically, we used Poisson ($p_d = e^{-\lambda}\lambda^d/d!$), exponential ($p_d = (1 - e^{-\kappa})\,e^{-\kappa(d-1)}$) and scale-free ($p_d = d^{-\gamma}/\zeta(\gamma)$) degree distributions, and Soffer-Vázquez transitivity (T̃) ranging from 0 to 1 in increments of 0.1. Figure 3 illustrates a graph (N = 50) with a Poisson-distributed degree sequence evolving towards higher transitivity.

We evaluated the performance of the algorithm in comparison to those proposed in [40] (as a representative of the growing-network class of clustered graph generators) and in [25] (as a representative of the class of bipartite models). Specifically, we measured the discrepancies between input and output degree distributions (Figure 4, left panels) and transitivity values (Figure 4, right panels). Our algorithm preserves the input degree sequence perfectly, while there are considerable mismatches between the input and output degree distributions in the Volz and Newman models. For Poisson and exponentially distributed graphs, our algorithm closely approaches the target transitivity. Graphs with these degree distributions cannot, however, reach the highest transitivity values (Figures 4b and 4d) without becoming disconnected. Unlike our algorithm, the Volz and Newman models do not require connectivity, which may explain the superior performance of the Volz algorithm on the Poisson network at the maximum transitivity value (Figure 4b). The Volz algorithm also performs well at low values of T̃ for both the Poisson and exponential networks (Figures 4b and 4d), while the Newman algorithm only performs well on the Poisson networks.

Our algorithm performs quite poorly on scale-free random graphs (Figure 4f), which have much higher clustering a priori than expected for Poisson random graphs [34, 28]. Our algorithm is not designed to decrease clustering, and therefore can only reach the desired level if the initial random graph has lower clustering than desired. The triangles in a connected scale-free random graph are also close to the minimum required to keep the graph connected, and thus modifying our algorithm to decrease (as well as increase) the triangles in a graph would likely not improve its performance on the scale-free graphs.

3.2. Structural Properties of Our Generated Networks
There are several other topological properties (besides degree sequence and transitivity) that can strongly influence network function and dynamics: degree correlations (the dependence of a node's degree on its neighbors' degrees), community structure (groups of nodes that are highly intra-connected and only loosely inter-connected), and average path length (typical distances between pairs of nodes in the network). We have specifically developed this model to increase clustering with minimal structural byproducts. To check that we have reached this goal, we measured the above properties in the networks generated by our algorithm.

We evaluated the extent to which the algorithm introduces degree correlations by comparing random (unclustered) graphs to clustered random graphs generated by our algorithm and the Volz [40] and Newman [25] algorithms (Figure 5). While our algorithm essentially preserves the correlation structure of the random graph, the other algorithms produce highly correlated graphs.
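Degree correlations of the kind compared here are commonly summarized by the degree correlation coefficient r reported in Table 2. The sketch below assumes the standard definition of r as the Pearson correlation between the degrees at the two ends of each edge, counting every undirected edge in both directions so the measure is symmetric; the function name is hypothetical.

```python
def degree_correlation(adj):
    """Pearson correlation of the degrees at the two ends of each edge,
    with every undirected edge counted in both directions."""
    pairs = [(len(adj[u]), len(adj[v])) for u in adj for v in adj[u]]
    n = len(pairs)
    mean = sum(a for a, _ in pairs) / n  # identical for both ends by symmetry
    cov = sum((a - mean) * (b - mean) for a, b in pairs) / n
    var = sum((a - mean) ** 2 for a, _ in pairs) / n
    return cov / var if var else float("nan")  # undefined for regular graphs
```

A star is perfectly disassortative (r = −1), since every edge joins the hub to a leaf; an uncorrelated random graph should give r close to 0.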

Several authors have discussed the relationship between clustering and community structure [28, 7, 32, 34]. As Figure 3 shows, the addition of triangles leads to modular structure. This behavior is not surprising: as the number of edges in the graph is constrained, sets of connected nodes with high ω(i) values (often high-degree nodes) must be brought together to create additional clustering.

Short average path lengths are a characteristic feature of random graphs [23]. To quantify the impact of our algorithm on path lengths, we calculated the average path length from each node to all other (N − 1) nodes, and then compared the distributions of these values for several random and random clustered graphs (Figure 6). While our algorithm preserves short average path lengths, the mean of the path length distribution tends to be slightly larger for the clustered graphs than for the corresponding random graphs. The intuition behind this increase in average path length may lie in the increased community structure: as graphs become more clustered and separate into subgroups, nodes in different groups require more links to reach each other.
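The per-node quantity used here, the mean shortest-path distance from a node to the other N − 1 nodes, can be computed with one breadth-first search per node. A minimal sketch, assuming an unweighted, connected graph stored as neighbour sets (the function name is hypothetical):

```python
from collections import deque

def mean_distance_per_node(adj):
    """For each node, the mean BFS distance to the other N - 1 nodes
    (assumes a connected, unweighted graph)."""
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[source] = sum(dist.values()) / (n - 1)
    return result
```

The distribution of these values over all nodes is what is compared between random and random clustered graphs; the overall cost is O(N(N + M)).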

3.3. Comparison to Empirical Networks 

Our algorithm can also be applied to detecting non-random structure in empirical networks. We can generate ensembles of clustered random networks with empirically estimated degree distributions and clustering values to ascertain whether empirical networks have significant non-random structure in other respects. We demonstrate this application using representatives from five different classes of real networks: (1) a social network, made up of contacts between individuals in the city of Vancouver [18]; (2) a protein interaction network for yeast [38]; (3) a technological network, made up of a subset of the links of the World Wide Web [17]; (4) a transportation network, made up of US metropolitan areas connected by air travel [?]; and (5) a collaboration network, made up of scientists connected by coauthorship on scientific preprints on the Astrophysics E-Print Archive between 1995 and 1999 [22], with a collaboration strength of 0.5 or greater [21]. The basic statistics of these networks, including clustering values, are listed in Table 1.

We used the following method to quantify deviations 
from randomness in these networks. First, we used 
our algorithm to generate 25 clustered random networks 
constrained to match the empirical degree distribution 
and clustering values. Second, we selected a set of net- 
work topological measures (other than degree distribu- 
tion and clustering), and compared these quantities for 
the empirical graph to the corresponding average quan- 
tities across the ensemble of generated graphs. 
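The two-step procedure can be written as a small, generic driver. In the sketch below, `generate` stands for any function that returns one clustered random graph matching the empirical degree sequence and clustering. It is a hypothetical placeholder, not an implementation of the algorithm. `measures` maps names to functions that compute a scalar from a graph. Deviations are reported as ensemble mean minus observed value, the convention used in this section. A label-shuffling stand-in generator is used only to make the example runnable.

```python
import random
from statistics import mean, stdev


def compare_to_ensemble(observed_graph, generate, measures, n_samples=25, seed=0):
    """Compare an empirical graph with an ensemble of generated null-model graphs.

    generate(rng) -> graph   hypothetical generator of one random clustered graph
    measures: {name: f(graph) -> float}
    Returns {name: (observed, ensemble_mean, ensemble_sd, deviation)},
    where deviation = ensemble_mean - observed.
    """
    rng = random.Random(seed)
    ensemble = [generate(rng) for _ in range(n_samples)]
    report = {}
    for name, measure in measures.items():
        observed = measure(observed_graph)
        values = [measure(g) for g in ensemble]
        m = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0
        report[name] = (observed, m, sd, m - observed)
    return report


def relabel(adj, rng):
    """Stand-in generator: a random relabelling keeps the degree sequence."""
    nodes = list(adj)
    perm = dict(zip(nodes, rng.sample(nodes, len(nodes))))
    return {perm[u]: {perm[v] for v in nbrs} for u, nbrs in adj.items()}


cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
measures = {"mean_degree": lambda g: sum(len(n) for n in g.values()) / len(g)}
report = compare_to_ensemble(cycle, lambda rng: relabel(cycle, rng), measures)
print(report["mean_degree"])  # (2.0, 2.0, 0.0, 0.0)
```

Because the null model fixes the degree sequence, degree-based measures show zero deviation by construction. Only measures outside the constraints, such as diameter or modularity, carry information about non-random structure.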

Specifically, we generated 25 random clustered networks
for each empirical network, constrained to match the
empirical degree sequence and Soffer-Vasquez transitivity.
In addition to the degree and clustering metrics, we also
calculated the diameter (the longest shortest path length
between any pair of nodes in the graph) [13], the degree
correlation coefficient [24] and modularity (the degree of
community structure) [27] (Table 2). Other than diameter,
each of these statistics lies between −1 and 1. The
standard deviations for all statistics are negligible and
thus not reported. For every statistic, we also give the
deviation between the empirical value and the average
across the generated ensemble of random clustered networks
(specifically, deviation = ensemble mean − observed value).
Small deviations suggest that the structure of the
empirical network is largely explained by its degree
distribution and clustering, so attention should turn to
possible mechanisms underlying these properties. In
contrast, large deviations suggest that there are other
fundamental properties to consider in addition to or,
perhaps, instead of clustering.
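For reference, the degree correlation coefficient cited above can be computed as a Pearson correlation. It correlates the degrees at the two ends of each edge, with every undirected edge counted in both directions, which is the standard form of the assortativity coefficient. The sketch below assumes adjacency sets and is an illustration, not the code used to produce Table 2.

```python
from math import sqrt


def degree_assortativity(adj):
    """Pearson correlation of degrees at either end of each edge.

    Each undirected edge (u, v) contributes both (k_u, k_v) and (k_v, k_u),
    which makes the coefficient symmetric.
    """
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:  # every edge is visited once from each endpoint
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / m
    vx = sum((x - mx) ** 2 for x in xs) / m
    vy = sum((y - my) ** 2 for y in ys) / m
    if vx == 0 or vy == 0:
        return float("nan")  # all edge ends have equal degree: undefined
    return cov / sqrt(vx * vy)


# A star is perfectly disassortative: the hub links only to leaves.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(degree_assortativity(star))  # -1.0
```

Positive values indicate that high-degree nodes tend to link to other high-degree nodes, the pattern reported below for the two social networks.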

The random counterparts of the US air traffic network,
for example, have structural properties almost identical
to those of the real network, suggesting that the structure
of the US air traffic network comes almost exclusively from
its degree patterns. (In fact, even its high clustering is
explained by the degree patterns alone.) We note that the
US air traffic network is the most engineered of the
networks we consider, and thus may have fewer emergent
properties. The remaining empirical networks differ
considerably from their random counterparts, suggesting
that there are important mechanistic features not captured
in our random model. For example, the two social networks
(the Vancouver urban contact network and the Astro-Phys
collaboration network) have higher degree assortativity
than our random networks. This may point to rules of social
behavior beyond those dictated by the number of "friends"
and the tendency that "my friend's friend is also my
friend." All the natural networks also have significantly
higher community structure than the corresponding random
networks, in spite of having a wide range of transitivity
values. This shows that clustering and community structure
are not necessarily positively correlated.



4. Conclusion 

In this work, we have introduced a Markov chain sim- 
ulation algorithm to generate clustered random graphs 
with a specified degree sequence and level of cluster- 
ing. Our algorithm perfectly preserves the degree se- 
quence of a random graph and generally maintains other 
fundamental properties of random graphs like short path 
length and low degree correlations. An ensemble of the 
graphs generated by this algorithm can thus be useful 
for systematically studying the impact of triangles on 
network function and dynamics and for identifying the
essential structural features of empirical networks. Since
this method is based on a dynamic process, it can be used
to generate both static networks with
a specified amount of clustering and dynamic networks 
with evolving levels of clustering. Furthermore, since 
the process is a "memoryless" one, additional clustering 
can be added to any network without having to grow a 
new one from scratch. 
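To make the "memoryless" property concrete, the following is a minimal sketch of a degree-preserving rewiring chain that can start from any existing simple graph. It uses a generic double-edge swap as its move: edges (a, b) and (c, d) are replaced by (a, d) and (c, b). This move is an assumption; the specific rewiring move of our algorithm is the one defined earlier in the paper. The `uphill_only` flag switches between two acceptance rules. One keeps a swap only if it increases the number of triangles; the other keeps every valid swap. These are the two variants compared in the Appendix. Connectivity checking is omitted. The stopping rule uses the standard transitivity T with a step budget rather than the Soffer-Vasquez measure. Triangles are recounted at every step for clarity, not speed. All names are hypothetical.

```python
import random
from itertools import combinations


def triangles(adj):
    """Total number of triangles in a simple undirected graph."""
    corners = sum(1 for u in adj
                  for v, w in combinations(adj[u], 2) if w in adj[v])
    return corners // 3  # each triangle is seen once from each of its corners


def transitivity(adj):
    """Standard transitivity T = 3 * triangles / connected triples."""
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return 3 * triangles(adj) / triples if triples else 0.0


def _swap(adj, a, b, c, d):
    """Replace edges (a, b) and (c, d) with (a, d) and (c, b); degrees unchanged."""
    for x, y in ((a, b), (c, d)):
        adj[x].remove(y)
        adj[y].remove(x)
    for x, y in ((a, d), (c, b)):
        adj[x].add(y)
        adj[y].add(x)


def rewire(adj, target_T, max_steps=10_000, uphill_only=True, seed=0):
    """Markov chain of degree-preserving double-edge swaps, modifying adj in place.

    uphill_only=True keeps a swap only if the triangle count increases;
    uphill_only=False keeps every valid swap. Stops once transitivity
    reaches target_T or after max_steps proposals.
    """
    rng = random.Random(seed)
    current = triangles(adj)
    for _ in range(max_steps):
        if transitivity(adj) >= target_T:
            break
        edges = [(u, v) for u in adj for v in adj[u] if u < v]
        (a, b), (c, d) = rng.sample(edges, 2)
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two ways to reconnect the endpoints
        # Skip proposals that would create a self-loop or a duplicate edge.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        _swap(adj, a, b, c, d)
        new = triangles(adj)
        if uphill_only and new <= current:
            _swap(adj, a, d, c, b)  # reject: restore (a, b) and (c, d)
        else:
            current = new
    return adj


# Start from an existing graph (a 9-node ring with no triangles) and add clustering.
ring = {i: {(i - 1) % 9, (i + 1) % 9} for i in range(9)}
degrees_before = {u: len(n) for u, n in ring.items()}
print(triangles(ring), transitivity(ring))  # 0 0.0
rewire(ring, target_T=0.5, max_steps=2000)
print(triangles(ring), transitivity(ring))
```

The chain needs only the current graph, so the same call can resume from any network, including one already produced by an earlier run. Setting `uphill_only=False` gives the permissive variant.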


Appendix

We evaluated the effectiveness of an algorithm that accepts
all rewirings regardless of their effect on the number of
triangles. Recall that our main algorithm only makes
rewirings that increase the number of triangles. In
Figure 7, we show that the permissive algorithm is not
effective in achieving the desired level of clustering.
Additionally, other structural properties of the network,
e.g. degree correlations, are significantly altered from
the original graph in this case (as shown in the right
panel of Figure 7).

Figure 8 illustrates a network in which disconnection is
required to achieve maximal clustering.

Figure 9 shows that our algorithm does not drastically
change the number of possible triangles (ω(G)) in the graph.

Figure 3: The evolution of the graph through (a) T ≈ 0, (b) T = 0.1, (c) T = 0.5 and (d) T = 0.8.

Figure 4: Discrepancies between input and output degree distributions (left panels) and transitivity values (right panels)
for an ensemble of 15 Poisson (top panels), exponential (middle panels) and scale-free graphs (bottom panels) as
generated by our algorithm and the algorithms presented in [40] and [25]. Each graph has N = 500 and mean degree
⟨d⟩ = 5. The input degree distribution is shown as a gray solid line (left graphs); output degree distributions are
not shown for our algorithm, as they always match the input exactly. The input and output transitivity values are
measured as T for our algorithm, and as T for the Volz and Newman algorithms.

Figure 5: Degree correlations in random graphs with specified degree distributions (Poisson, exponential and scale-
free with mean degree 5) compared to clustered random graphs with the same degree distributions and T = 0.5
generated by our algorithm and the Volz [40] and Newman [25] algorithms. The graphs present averages over 15
graphs generated by each algorithm. Our algorithm introduces fewer degree correlations than the alternatives.

Figure 6: Average path lengths in random graphs with specified degree distributions (Poisson and exponential with
mean degree 5) compared to clustered random graphs with the same degree distributions and T = 0.5 generated
by our algorithm. The histograms are based on 15 networks of each type. The clustered graphs have slightly higher
means than their random counterparts: 4.05 for the Poisson random graphs versus 4.39 for the clustered graphs, and
3.95 for the exponential random graphs versus 4.14 for the clustered graphs.

Figure 7: The effect of allowing both uphill (rewirings that increase the total number of triangles) and downhill
(rewirings that decrease the total number of triangles) moves. These results are shown for a Poisson distributed graph
of 500 nodes. In the left panel, we see that allowing all rewirings is not effective in reaching the desired transitivity
(T = 0.45). We also find that the structure of the graph is altered significantly in the process of making all rewirings.
The degree correlation coefficient, for example, varies significantly with each rewiring (as shown in the right panel).

Figure 8: (a) A random graph with 10 nodes, each of degree 4. (b) The graph in (a) must be disconnected to be
maximally clustered (C = T = C̃ = T̃ = 1).

Figure 9: The numbers of triangles, δ(G), and possible triangles, ω(G), in the graph as the algorithm progresses. ω(G)
does not vary substantially during graph generation. These results are for a Poisson distributed graph of 500 nodes,
to which triangles are added until reaching (a) Soffer-Vasquez transitivity = 0.5 and (b) Soffer-Vasquez clustering
coefficient = 0.5.